DataFrame
A two-dimensional data structure: data is arranged as a table of rows and columns.
Syntax:
pandas.DataFrame(data, index, columns, dtype, copy)
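| Parameter | Description |
|---|---|
| data | Input data in various forms, such as an ndarray, list, dict, Series, scalar constants, or another DataFrame |
| index | Row labels. Values must be hashable and match the length of the data. Uniqueness is not enforced, so duplicate labels are allowed. If no index is passed, it defaults to a range equivalent to np.arange(n) |
| columns | Column labels. If no labels are passed, they default to a range equivalent to np.arange(n) |
| dtype | Data type to force. If omitted, the data type is inferred automatically |
| copy | Whether to copy the input data. Defaults to False in older pandas versions; newer versions default to None, where the behavior depends on the type of data |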
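As a rough illustration of how these parameters fit together, the sketch below builds a small DataFrame from an ndarray. The names `values`, `r1`/`r2`, and `x`/`y`/`z` are made up for this example and are not part of the original notes.

```python
import numpy as np
import pandas as pd

# A 2x3 ndarray as the data source (illustrative values)
values = np.array([[1, 2, 3], [4, 5, 6]])

# index labels the rows, columns labels the columns, dtype forces float64
df = pd.DataFrame(values, index=['r1', 'r2'], columns=['x', 'y', 'z'], dtype=float)
print(df)

# Without index/columns, both default to 0..n-1
df_default = pd.DataFrame(values)
print(list(df_default.index))    # [0, 1]
print(list(df_default.columns))  # [0, 1, 2]
```

Passing explicit labels is optional; the default labels are simply integer positions.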
1> Create an empty DataFrame
import pandas as pd
df = pd.DataFrame()
print(f'Empty DataFrame: \n{df}')
# Output:
# Empty DataFrame:
# Empty DataFrame
# Columns: []
# Index: []
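If the emptiness needs to be checked in code rather than read from the printed output, the `empty` and `shape` attributes can be used. This is a minimal sketch; the variable name `empty_df` is chosen only for illustration.

```python
import pandas as pd

empty_df = pd.DataFrame()

# empty is True when the frame has no elements; shape is (rows, columns)
print(empty_df.empty)  # True
print(empty_df.shape)  # (0, 0)
```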
2> Create a DataFrame from a list
A DataFrame can be created from a single list or from nested lists.
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(f'DataFrame from a single list: \n{df}')
# Output:
# DataFrame from a single list:
# 0
# 0 1
# 1 2
# 2 3
# 3 4
# 4 5
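data = [['xiao meng', 20], ['xiao zhi', 21], ['xiao qiang', 23]]
# dtype=float on the whole frame can raise on the string 'name' column in recent pandas versions, so only 'age' is cast
df = pd.DataFrame(data, columns=['name', 'age']).astype({'age': float})
print(f'DataFrame from nested lists:\n{df}')
# Output:
# DataFrame from nested lists: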
# name age
# 0 xiao meng 20.0
# 1 xiao zhi 21.0
# 2 xiao qiang 23.0
3> Create a DataFrame from a dictionary
data = {'Name': ['xiao meng', 'xiao zhi', 'xiao qiang', 'xiao wang'], 'Age': [20, 21, 23, 22]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(f'DataFrame from a dictionary:\n{df}')
# Output:
# DataFrame from a dictionary:
# Name Age
# rank1 xiao meng 20
# rank2 xiao zhi 21
# rank3 xiao qiang 23
# rank4 xiao wang 22
4> Create a DataFrame from a list of dictionaries
data = [{'a': 1, 'b': 3}, {'a': 4, 'b': 10, 'c': 8}]
df_1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
print(df_1)
# Output:
# a b
# first 1 3
# second 4 10
# 'b1' is not a key in any of the dictionaries, so that column is filled with NaN
df_2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df_2)
# Output:
# a b1
# first 1 NaN
# second 4 NaN
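When `columns` is omitted, the column set is taken from the dictionary keys, and keys missing from a given dictionary become NaN in that row. The sketch below reuses the same `data`; the name `df_3` is introduced only for this example.

```python
import pandas as pd

data = [{'a': 1, 'b': 3}, {'a': 4, 'b': 10, 'c': 8}]

# No columns argument: the columns are the union of all keys
df_3 = pd.DataFrame(data, index=['first', 'second'])
print(df_3)

# The first dictionary has no 'c' key, so that cell is NaN
print(pd.isna(df_3.loc['first', 'c']))  # True
```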
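5> Create a DataFrame from a dictionary of Series
dict_v = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
# The row index is the union of the Series indexes; 'one' has no label 'd', so that cell is NaN
df = pd.DataFrame(dict_v)
print(df)
# Output: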
# one two
# a 1.0 1
# b 2.0 2
# c 3.0 3
# d NaN 4
6> Column selection
dict_v = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(dict_v)
print(df['one'])
# Output:
# a 1.0
# b 2.0
# c 3.0
# d NaN
# Name: one, dtype: float64
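Selecting with a single label returns a Series, while selecting with a list of labels returns a DataFrame. The following sketch shows the difference on a small frame; `subset` is a name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

single = df['one']            # one label -> Series
subset = df[['one', 'two']]   # list of labels -> DataFrame

print(type(single).__name__)  # Series
print(type(subset).__name__)  # DataFrame
print(list(subset.columns))   # ['one', 'two']
```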
7> Column addition
dict_v = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(dict_v)
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(f'Add a new column from a Series:\n{df}')
# Output:
# Add a new column from a Series:
# one two three
# a 1.0 1 10.0
# b 2.0 2 20.0
# c 3.0 3 30.0
# d NaN 4 NaN
df['four'] = df['one'] + df['three']
print(f'Add a new column from existing columns:\n{df}')
# Output:
# Add a new column from existing columns:
# one two three four
# a 1.0 1 10.0 11.0
# b 2.0 2 20.0 22.0
# c 3.0 3 30.0 33.0
# d NaN 4 NaN NaN
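Besides item assignment, pandas also offers `assign`, which returns a new DataFrame and leaves the original untouched, and `insert`, which places a column at a chosen position. This is a small sketch with made-up data rather than the frame used above.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'three': [10.0, 20.0, 30.0]},
                  index=['a', 'b', 'c'])

# assign returns a copy with the extra column; df itself is not modified
with_four = df.assign(four=df['one'] + df['three'])

# insert modifies df in place, putting 'two' at column position 1
df.insert(1, 'two', [1, 2, 3])

print(list(with_four.columns))  # ['one', 'three', 'four']
print(list(df.columns))         # ['one', 'two', 'three']
```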
8> Column deletion
dict_v = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(dict_v)
print(f'Initial DataFrame:\n{df}')
# Output:
# Initial DataFrame:
# one two three
# a 1.0 1 10.0
# b 2.0 2 20.0
# c 3.0 3 30.0
# d NaN 4 NaN
del df['one']
print(f'After deleting the first column with the del statement:\n{df}')
# Output:
# After deleting the first column with the del statement:
# two three
# a 1 10.0
# b 2 20.0
# c 3 30.0
# d 4 NaN
df.pop('two')
print(f'After removing a column with pop:\n{df}')
# Output:
# After removing a column with pop:
# three
# a 10.0
# b 20.0
# c 30.0
# d NaN
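Two details may be worth noting: `pop` returns the removed column as a Series, and `drop(columns=...)` returns a new DataFrame without changing the original. The sketch below uses a small made-up frame; `trimmed` and `popped` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# drop returns a new frame; df still has all three columns afterwards
trimmed = df.drop(columns=['one'])

# pop removes 'two' from df in place and hands back the removed Series
popped = df.pop('two')

print(list(trimmed.columns))  # ['two', 'three']
print(list(df.columns))       # ['one', 'three']
print(list(popped))           # [3, 4]
```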
9> Row selection
# Select a row by its label with loc
dict_v = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(dict_v)
print(df.loc['b'])
# Output:
# one 2.0
# two 2.0
# three 20.0
# Name: b, dtype: float64
# Select a row by passing its integer position to iloc
print(df.iloc[2])
# Output:
# one 3.0
# two 3.0
# three 30.0
# Name: c, dtype: float64
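Several rows can be selected at once with a slice. A point that is easy to miss: positional slices with `iloc` exclude the end position, while label slices with `loc` include the end label. The sketch below uses a small made-up frame to show both.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# Positions 1 and 2 (end position 3 is excluded)
by_position = df.iloc[1:3]

# Labels 'b' through 'd' (end label is included)
by_label = df.loc['b':'d']

print(list(by_position.index))  # ['b', 'c']
print(list(by_label.index))     # ['b', 'c', 'd']
```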
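10> Row addition and deletion
df1 = pd.DataFrame([[1, 2],[3, 4]], columns = ['a', 'b'])
df2 = pd.DataFrame([[5, 6],[7, 8]], columns = ['a', 'b'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
df = pd.concat([df1, df2])
print(f'Original data:\n{df}')
# Output:
# Original data: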
# a b
# 0 1 2
# 1 3 4
# 0 5 6
# 1 7 8
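# drop(0) removes every row labeled 0, so one row from each original frame is dropped
df = df.drop(0)
print(f'After dropping row label 0:\n{df}')
# Output:
# After dropping row label 0: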
# a b
# 1 3 4
# 1 7 8
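Because the combined frame keeps the row labels of both inputs, label 0 appears twice and `drop(0)` removes two rows. If unique labels are preferred, one option is to renumber the rows while combining. The sketch below contrasts the two behaviors using `pd.concat`; the names `stacked` and `renumbered` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# Labels kept from the inputs: 0, 1, 0, 1
stacked = pd.concat([df1, df2])
print(len(stacked.drop(0)))     # 2

# ignore_index=True renumbers the rows: 0, 1, 2, 3
renumbered = pd.concat([df1, df2], ignore_index=True)
print(len(renumbered.drop(0)))  # 3
```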